In [118]:
import json
import pandas as pd
import urllib.request
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
import numpy as np

Task 1: Identify one or more suitable web APIs

The API which I have chosen for this project is the Earthquake Catalog provided by the United States Geological Survey.

This API allows the user to create lots of different searches and queries from eqarthquake data collected from both the present day and the past. There are a vast array of different parameters which I used below in order to find out very specific information.

Fortunately, the information provided by the query function found in this API is outputted in CSV format. This made it very easy to manipulate the data using Pandas.

The API Documentation may be found here: https://earthquake.usgs.gov/fdsnws/event/1/

This API is free to use, and as such no API Key was needed.

Task 2: Collect data from your chosen API

Fortunately for myself, the data provided by this API comes in a very easy to work with format. I have chosen to use a CSV file for the collection of this data, as it can be very easily transferred to a panda dataframe through the use of Panda's inbuilt functions.

For this project, I will be working two data sets for the purpose of analysing them. The first such dataset is the earthquakes from 1st September 2019 to the 30th September 2019. This data will be collected from all of the earthquakes from the month, regardless how small. This will be a very large dataset, however it should give us an insight into certain metrics behind earthqaukes from location to magnitude to depth.

We will call this dataset - "monthset"

In [119]:
monthset = pd.read_csv("https://earthquake.usgs.gov/fdsnws/event/1/query?format=csv&starttime=2019-9-01&endtime=2019-9-30")
print("monthset; Input complete")
print("monthset; monthset contains: " + str(len(monthset)) + " entries")
monthset; Input complete
monthset; monthset contains: 15743 entries

The second dataset that we will introduce for this project is every earthquake in the last 20 years with a minimum magnitude of 5.5. This is due to the fact that according to http://www.geo.mtu.edu/UPSeis/magnitude.html, this is the threshhold where an earthquake may cause slight damage to buildings and other structures.

Unfortunately this had to be done in a somewhat crude manner as it had to be downloaded in seperate parts. The data is concatenated inside the loop which results in a full data set which we can use to analyse

In [120]:
# First we create initial
yearset = pd.read_csv("https://earthquake.usgs.gov/fdsnws/event/1/query?format=csv&starttime=1999-09-01&endtime=" + str(2004)+"-08-31&minmagnitude=5.5")

for i in range(2009,2020,5):
    temp = pd.read_csv("https://earthquake.usgs.gov/fdsnws/event/1/query?format=csv&starttime="+str(i-5)+"-09-01&endtime=" + str(i)+"-08-31&minmagnitude=5.5")
    yearset = pd.concat([yearset,temp])
print("yearset; Input complete")
print("yearset; yearset contains: " + str(len(yearset)) + " entries")
yearset; Input complete
yearset; yearset contains: 10092 entries

We can then sort these values by time

In [121]:
yearset = yearset.sort_values(by='time')
In [122]:
print(yearset)
                          time  latitude  longitude   depth  mag magType  nst  \
2401  1999-09-07T11:56:49.380Z   38.1190    23.6050   10.00  6.0     mwc  NaN   
2400  1999-09-07T17:50:00.120Z  -63.2890  -166.5660   10.00  5.5     mwc  NaN   
2399  1999-09-08T00:29:00.800Z    7.1380   123.6780  117.90  5.5     mwc  NaN   
2398  1999-09-09T14:02:01.590Z   47.5060   154.3340   33.00  5.5     mwc  NaN   
2397  1999-09-10T08:45:23.160Z   46.0240   150.2600   91.20  5.8     mwc  NaN   
2396  1999-09-10T19:37:44.810Z  -32.8310  -178.2700   33.00  6.0     mwb  NaN   
2395  1999-09-12T03:03:18.500Z   28.9700   142.0530   33.00  5.9     mwb  NaN   
2394  1999-09-13T04:43:19.420Z   -3.6360   149.5150   56.70  5.6     mwc  NaN   
2393  1999-09-13T11:55:28.180Z   40.7090    30.0450   13.00  5.9     mwb  NaN   
2392  1999-09-14T04:01:25.900Z  -55.9890   -27.8090   33.00  5.5      mb  NaN   
2391  1999-09-14T22:17:24.430Z   15.0900   146.2230   91.50  5.5     mwc  NaN   
2390  1999-09-15T03:01:24.340Z  -20.9340   -67.2750  218.00  6.4     mwc  NaN   
2389  1999-09-17T14:54:48.720Z  -13.7900   167.2380  196.80  6.3     mwc  NaN   
2388  1999-09-17T23:48:11.200Z  -14.4080  -178.2100   33.00  5.5     mwc  NaN   
2387  1999-09-18T04:01:04.400Z   -4.4220  -104.4440   10.00  5.6     mwc  NaN   
2386  1999-09-18T06:50:58.120Z   -6.4430   147.7990   48.80  5.6     mwc  NaN   
2385  1999-09-18T21:28:33.170Z   51.2070   157.5560   60.00  6.0     mwc  NaN   
2384  1999-09-18T23:51:30.480Z  -19.7130   169.2050  102.80  5.9     mwc  NaN   
2383  1999-09-19T00:27:23.060Z   46.4190   153.3770   45.60  5.8     mwc  NaN   
2382  1999-09-19T03:18:54.570Z   -3.6240   150.8750  430.50  5.9     mwc  NaN   
2381  1999-09-20T09:32:42.710Z   46.3340   153.4630   33.00  5.5     mwc  NaN   
2380  1999-09-20T17:47:18.490Z   23.7720   120.9820   33.00  7.7     mwc  NaN   
2379  1999-09-20T17:47:32.000Z   23.6020   120.9660   25.00  6.3      mw  NaN   
2378  1999-09-20T17:57:16.090Z   23.7850   121.2020   33.00  6.1      mb  NaN   
2377  1999-09-20T18:03:44.290Z   23.5700   121.2990   33.00  6.3      mb  NaN   
2376  1999-09-20T18:11:53.650Z   23.7460   121.1890   33.00  6.1      mb  NaN   
2375  1999-09-20T18:16:18.510Z   23.7560   121.2460   33.00  6.2      mb  NaN   
2374  1999-09-20T21:46:42.870Z   23.3900   120.9640   33.00  6.4     mwc  NaN   
2373  1999-09-21T22:24:44.080Z  -63.6800  -167.1630   10.00  5.6     mwc  NaN   
2372  1999-09-22T00:14:39.150Z   23.7290   121.1670   26.00  6.4     mwc  NaN   
...                        ...       ...        ...     ...  ...     ...  ...   
29    2019-08-01T20:01:28.233Z  -34.2812   -72.3734   15.41  5.6     mww  NaN   
28    2019-08-02T05:50:55.239Z  -49.7307  -113.8331   10.00  6.0     mww  NaN   
27    2019-08-02T12:03:27.001Z   -7.2822   104.7907   49.00  6.9     mww  NaN   
26    2019-08-04T10:23:03.726Z   37.7594   141.6031   38.00  6.3     mww  NaN   
25    2019-08-04T11:44:21.302Z  -48.8150   163.7897   10.00  5.6     mww  NaN   
24    2019-08-05T00:40:46.024Z    1.0417   -27.8703   10.00  5.8     mww  NaN   
23    2019-08-05T09:01:00.670Z  -18.3793  -174.3733   10.00  5.7     mww  NaN   
22    2019-08-06T22:14:14.863Z  -17.9594   168.5844  150.00  5.9     mww  NaN   
21    2019-08-07T05:32:40.615Z  -15.4975   167.6555  124.00  5.9     mww  NaN   
20    2019-08-07T21:28:03.651Z   24.4782   121.9301   20.79  5.8     mww  NaN   
19    2019-08-08T00:45:26.713Z   36.5272    70.0571  226.00  5.8     mww  NaN   
18    2019-08-08T11:25:31.104Z   37.9350    29.7003   11.00  5.9     mww  NaN   
17    2019-08-12T20:39:34.864Z   15.9767   -93.7276   93.00  5.5     mww  NaN   
16    2019-08-14T21:35:18.451Z   20.5023  -109.2878   10.00  5.9     mww  NaN   
15    2019-08-15T10:38:32.175Z  -20.8068   173.4883   10.00  5.5     mww  NaN   
14    2019-08-18T18:00:25.437Z   16.6260   146.3156   17.92  5.9     mww  NaN   
13    2019-08-20T13:03:52.633Z  -11.3683   166.2912   37.00  6.0     mww  NaN   
12    2019-08-21T09:24:54.211Z   -6.0640   154.6474   64.00  5.5     mww  NaN   
11    2019-08-21T14:28:25.652Z  -50.3302   139.3240   10.00  6.0     mww  NaN   
10    2019-08-22T11:02:40.807Z   17.5608   145.4748  519.87  5.5     mww  NaN   
9     2019-08-22T19:27:11.919Z  -14.6668  -177.4135   10.00  5.9     mww  NaN   
8     2019-08-22T22:54:25.722Z  -12.5806   167.0624  206.89  5.6     mww  NaN   
7     2019-08-23T10:44:21.167Z  -11.5925   166.3936   43.00  5.5     mww  NaN   
6     2019-08-24T15:51:27.119Z  -14.3090   167.1897  115.00  6.0     mww  NaN   
5     2019-08-24T21:21:27.484Z  -20.1852  -175.6299  215.14  5.5     mww  NaN   
4     2019-08-27T23:55:19.187Z  -60.2152   -26.5801   16.00  6.6     mww  NaN   
3     2019-08-28T07:59:02.264Z  -60.3573   -26.5291   10.00  5.6     mww  NaN   
2     2019-08-28T08:00:45.169Z  -60.5584   -26.8518   10.00  5.5      mb  NaN   
1     2019-08-28T23:46:40.132Z   41.0682   142.9955   23.77  5.9     mww  NaN   
0     2019-08-29T15:07:58.646Z   43.5425  -127.8817   10.00  6.3     mww  NaN   

       gap    dmin   rms  ...                   updated  \
2401   NaN     NaN  1.12  ...  2017-04-26T17:56:11.568Z   
2400   NaN     NaN  1.05  ...  2016-11-09T23:25:59.659Z   
2399   NaN     NaN  1.20  ...  2016-11-09T23:26:27.635Z   
2398   NaN     NaN  0.73  ...  2016-11-09T23:26:51.661Z   
2397   NaN     NaN  0.91  ...  2016-11-09T23:27:20.409Z   
2396   NaN     NaN  1.19  ...  2016-11-09T23:27:21.963Z   
2395   NaN     NaN  1.05  ...  2016-11-09T23:28:16.558Z   
2394   NaN     NaN  1.23  ...  2016-11-09T23:28:51.220Z   
2393   NaN     NaN  0.93  ...  2017-04-26T17:56:12.131Z   
2392   NaN     NaN  0.91  ...  2014-11-07T01:08:28.627Z   
2391   NaN     NaN  1.04  ...  2016-11-09T23:29:21.788Z   
2390   NaN     NaN  0.98  ...  2016-11-10T00:46:00.955Z   
2389   NaN     NaN  0.99  ...  2016-11-10T00:46:11.907Z   
2388   NaN     NaN  0.88  ...  2016-11-09T23:30:48.669Z   
2387   NaN     NaN  0.79  ...  2016-11-09T23:31:08.471Z   
2386   NaN     NaN  1.05  ...  2016-11-09T23:31:08.960Z   
2385   NaN     NaN  0.89  ...  2016-11-10T00:46:18.060Z   
2384   NaN     NaN  0.96  ...  2016-11-09T23:31:10.949Z   
2383   NaN     NaN  1.12  ...  2016-11-09T23:31:31.870Z   
2382   NaN     NaN  0.89  ...  2016-11-09T23:31:32.370Z   
2381   NaN     NaN  1.06  ...  2016-11-09T23:32:07.384Z   
2380   NaN     NaN  1.11  ...  2018-10-17T17:10:51.799Z   
2379   NaN     NaN   NaN  ...  2015-05-13T18:53:41.000Z   
2378   NaN     NaN  0.83  ...  2017-04-26T17:56:13.826Z   
2377   NaN     NaN  1.42  ...  2017-04-26T17:56:14.387Z   
2376   NaN     NaN  0.83  ...  2017-04-26T17:56:14.965Z   
2375   NaN     NaN  0.90  ...  2017-04-26T17:56:15.526Z   
2374   NaN     NaN  1.41  ...  2017-04-13T22:11:58.889Z   
2373   NaN     NaN  1.20  ...  2016-11-09T23:32:31.200Z   
2372   NaN     NaN  0.89  ...  2017-04-26T17:56:17.214Z   
...    ...     ...   ...  ...                       ...   
29    80.0   0.886  0.63  ...  2019-10-19T21:42:04.040Z   
28    82.0  22.790  0.92  ...  2019-10-19T21:42:04.040Z   
27    23.0   2.839  0.76  ...  2019-10-19T21:42:05.040Z   
26    32.0   0.653  0.78  ...  2019-10-19T21:42:09.040Z   
25    92.0   3.296  1.19  ...  2019-10-19T21:42:09.040Z   
24    41.0  10.529  0.81  ...  2019-10-31T20:26:38.550Z   
23    48.0   4.270  1.22  ...  2019-10-19T21:42:13.040Z   
22    10.0   2.827  0.95  ...  2019-10-29T20:33:39.040Z   
21    34.0   0.439  0.78  ...  2019-10-29T20:33:40.040Z   
20    17.0   0.430  0.83  ...  2019-10-29T20:33:41.040Z   
19    12.0   1.627  0.74  ...  2019-10-29T20:33:42.040Z   
18    25.0   0.646  0.96  ...  2019-10-29T20:33:43.040Z   
17    34.0   1.558  1.11  ...  2019-10-29T20:34:09.040Z   
16    70.0   2.285  0.58  ...  2019-11-02T15:26:05.040Z   
15    98.0   5.278  1.17  ...  2019-11-02T15:26:07.040Z   
14    28.0   3.326  0.82  ...  2019-11-02T15:26:14.040Z   
13    42.0   4.150  0.99  ...  2019-11-03T15:25:50.959Z   
12    30.0   3.095  0.91  ...  2019-09-13T12:15:31.132Z   
11    67.0   9.234  0.66  ...  2019-09-15T08:02:41.369Z   
10    43.0   2.326  1.04  ...  2019-10-31T11:12:49.517Z   
9     39.0   0.772  1.06  ...  2019-11-05T20:59:45.192Z   
8     28.0   2.852  0.71  ...  2019-09-27T21:42:14.238Z   
7     59.0   3.911  0.69  ...  2019-09-26T07:48:23.560Z   
6     26.0   1.131  1.19  ...  2019-09-29T13:56:53.190Z   
5     30.0   6.310  0.92  ...  2019-09-07T23:16:18.757Z   
4     19.0  12.896  0.91  ...  2019-09-10T02:20:41.763Z   
3     54.0   8.124  0.74  ...  2019-10-15T13:05:04.010Z   
2     55.0   8.154  0.95  ...  2019-10-03T06:09:33.798Z   
1     30.0   0.953  0.69  ...  2019-09-07T03:04:08.262Z   
0     31.0   2.678  1.17  ...  2019-10-28T20:06:17.021Z   

                                                  place        type  \
2401                                             Greece  earthquake   
2400                            Pacific-Antarctic Ridge  earthquake   
2399                              Mindanao, Philippines  earthquake   
2398                                      Kuril Islands  earthquake   
2397                                      Kuril Islands  earthquake   
2396                      south of the Kermadec Islands  earthquake   
2395                        Bonin Islands, Japan region  earthquake   
2394                                       Bismarck Sea  earthquake   
2393                                     western Turkey  earthquake   
2392                      South Sandwich Islands region  earthquake   
2391            Saipan region, Northern Mariana Islands  earthquake   
2390                                    Potosi, Bolivia  earthquake   
2389                                            Vanuatu  earthquake   
2388                                        Fiji region  earthquake   
2387                          central East Pacific Rise  earthquake   
2386        eastern New Guinea region, Papua New Guinea  earthquake   
2385  near the east coast of the Kamchatka Peninsula...  earthquake   
2384                                            Vanuatu  earthquake   
2383                                      Kuril Islands  earthquake   
2382               New Ireland region, Papua New Guinea  earthquake   
2381                                      Kuril Islands  earthquake   
2380                                             Taiwan  earthquake   
2379                                             Taiwan  earthquake   
2378                                             Taiwan  earthquake   
2377                                             Taiwan  earthquake   
2376                                             Taiwan  earthquake   
2375                                             Taiwan  earthquake   
2374                                             Taiwan  earthquake   
2373                            Pacific-Antarctic Ridge  earthquake   
2372                                             Taiwan  earthquake   
...                                                 ...         ...   
29                       100km WNW of Santa Cruz, Chile  earthquake   
28                           Southern East Pacific Rise  earthquake   
27                   106km WSW of Tugu Hilir, Indonesia  earthquake   
26                             61km ENE of Namie, Japan  earthquake   
25             270km NW of Auckland Island, New Zealand  earthquake   
24                           Central Mid-Atlantic Ridge  earthquake   
23                             50km NW of Neiafu, Tonga  earthquake   
22                        37km SE of Port-Vila, Vanuatu  earthquake   
21                        52km E of Luganville, Vanuatu  earthquake   
20                            15km SSE of Su'ao, Taiwan  earthquake   
19                     18km ESE of Farkhar, Afghanistan  earthquake   
18                            9km ESE of Baklan, Turkey  earthquake   
17                           12km SSE of Tonala, Mexico  earthquake   
16                   262km NE of Socorro Island, Mexico  earthquake   
15                          154km NW of Ceva-i-Ra, Fiji  earthquake   
14       75km ENE of Anatahan, Northern Mariana Islands  earthquake   
13                     87km SE of Lata, Solomon Islands  earthquake   
12                96km WNW of Panguna, Papua New Guinea  earthquake   
11                       Western Indian-Antarctic Ridge  earthquake   
10          68km SSW of Pagan, Northern Mariana Islands  earthquake   
9                 90km ESE of Sigave, Wallis and Futuna  earthquake   
8                            153km NNW of Sola, Vanuatu  earthquake   
7                    114km SSE of Lata, Solomon Islands  earthquake   
6                              61km SW of Sola, Vanuatu  earthquake   
5                        114km NNW of Nuku`alofa, Tonga  earthquake   
4     131km S of Bristol Island, South Sandwich Islands  earthquake   
3     147km S of Bristol Island, South Sandwich Islands  earthquake   
2     170km S of Bristol Island, South Sandwich Islands  earthquake   
1                         141km ENE of Hachinohe, Japan  earthquake   
0                             285km W of Bandon, Oregon  earthquake   

     horizontalError depthError  magError  magNst     status  locationSource  \
2401             NaN        NaN       NaN     NaN   reviewed              us   
2400             NaN        NaN       NaN     NaN   reviewed              us   
2399             NaN       16.4       NaN     NaN   reviewed              us   
2398             NaN        NaN       NaN     NaN   reviewed              us   
2397             NaN        NaN       NaN     NaN   reviewed              us   
2396             NaN        NaN       NaN     NaN   reviewed              us   
2395             NaN        NaN       NaN     NaN   reviewed              us   
2394             NaN       13.0       NaN     NaN   reviewed              us   
2393             NaN        NaN       NaN     NaN   reviewed              us   
2392             NaN        NaN       NaN    10.0   reviewed              us   
2391             NaN        NaN       NaN     NaN   reviewed              us   
2390             NaN        NaN       NaN     NaN   reviewed              us   
2389             NaN        NaN       NaN     NaN   reviewed              us   
2388             NaN        NaN       NaN     NaN   reviewed              us   
2387             NaN        NaN       NaN     NaN   reviewed              us   
2386             NaN        6.6       NaN     NaN   reviewed              us   
2385             NaN        NaN       NaN     NaN   reviewed              us   
2384             NaN        NaN       NaN     NaN   reviewed              us   
2383             NaN       12.0       NaN     NaN   reviewed              us   
2382             NaN        NaN       NaN     NaN   reviewed              us   
2381             NaN        NaN       NaN     NaN   reviewed              us   
2380             NaN        NaN       NaN     NaN   reviewed              us   
2379             NaN        NaN       NaN     NaN  automatic          iscgem   
2378             NaN        NaN       NaN    84.0   reviewed              us   
2377             NaN        NaN       NaN    80.0   reviewed              us   
2376             NaN        NaN       NaN    99.0   reviewed              us   
2375             NaN        NaN       NaN    88.0   reviewed              us   
2374             NaN        NaN       NaN     NaN   reviewed              us   
2373             NaN        NaN       NaN     NaN   reviewed              us   
2372             NaN        NaN       NaN     NaN   reviewed              us   
...              ...        ...       ...     ...        ...             ...   
29               4.9        3.4     0.086    13.0   reviewed              us   
28               8.5        1.8     0.058    29.0   reviewed              us   
27               7.0        1.9     0.043    51.0   reviewed              us   
26               7.3        1.9     0.071    19.0   reviewed              us   
25               7.0        1.8     0.080    15.0   reviewed              us   
24               6.8        1.8     0.050    38.0   reviewed              us   
23               9.1        1.8     0.051    37.0   reviewed              us   
22               6.6        1.7     0.040    59.0   reviewed              us   
21               5.1        1.8     0.044    50.0   reviewed              us   
20               5.3        2.1     0.068    21.0   reviewed              us   
19               6.2        1.8     0.053    34.0   reviewed              us   
18               4.7        1.7     0.039    64.0   reviewed              us   
17               5.2        1.9     0.036    73.0   reviewed              us   
16               6.1        1.8     0.031    98.0   reviewed              us   
15               8.5        1.8     0.066    22.0   reviewed              us   
14               8.5        4.0     0.056    31.0   reviewed              us   
13               7.7        1.9     0.066    22.0   reviewed              us   
12               8.3        1.9     0.063    24.0   reviewed              us   
11               7.9        1.7     0.071    19.0   reviewed              us   
10               9.5        4.2     0.059    28.0   reviewed              us   
9                6.5        1.8     0.068    21.0   reviewed              us   
8                7.7        3.5     0.049    40.0   reviewed              us   
7                7.1        1.9     0.078    16.0   reviewed              us   
6                8.7        1.8     0.043    53.0   reviewed              us   
5                9.7        3.9     0.061    26.0   reviewed              us   
4                9.1        1.7     0.045    48.0   reviewed              us   
3                9.9        0.6     0.083    14.0   reviewed              us   
2               10.2        1.8     0.076    61.0   reviewed              us   
1                5.1        2.9     0.053    34.0   reviewed              us   
0                6.9        1.8     0.040    60.0   reviewed              us   

     magSource  
2401       hrv  
2400       hrv  
2399       hrv  
2398       hrv  
2397       hrv  
2396        us  
2395        us  
2394       hrv  
2393        us  
2392        us  
2391       hrv  
2390       hrv  
2389       hrv  
2388       hrv  
2387       hrv  
2386       hrv  
2385       hrv  
2384       hrv  
2383       hrv  
2382       hrv  
2381       hrv  
2380       hrv  
2379    iscgem  
2378        us  
2377        us  
2376        us  
2375        us  
2374       hrv  
2373       hrv  
2372       hrv  
...        ...  
29          us  
28          us  
27          us  
26          us  
25          us  
24          us  
23          us  
22          us  
21          us  
20          us  
19          us  
18          us  
17          us  
16          us  
15          us  
14          us  
13          us  
12          us  
11          us  
10          us  
9           us  
8           us  
7           us  
6           us  
5           us  
4           us  
3           us  
2           us  
1           us  
0           us  

[10092 rows x 22 columns]

First of all we will extract some simple data from this initial set. I would like to find out what the maximum magnitude earthquake which occured over this time period

In [123]:
monthset['mag'].max()
Out[123]:
6.7

We will then do the same for the second set. Over a much larger time period, we would expect a larger maximum

In [124]:
yearset['mag'].max()
Out[124]:
9.1

Task 3: Parse the collected data, and store it in an appropriate file format

One of the main issues with these datasets with regards to analysing the data is that the time is currently stored in string format. To make it easier to analyse this data going forward it is necessary to convert it now rather than later for each individual analysis. We will first do this on the monthset, followed by the yearset.

As the monthset is all contained in the same month and year, we have no need to add these columns

In [125]:
# Code to switch the time given by the API into a month format

print("monthset; Converting String Date and Time into Day and Date columns")

day = []
list = []
datelist = []
daylist = []

# Split up the time format
for i in range(0,len(monthset.index)):
    splitup = monthset.iloc[i]['time'].split('-')
    list.append(splitup)
    
# Split up based on 'T'
for i in range(0,len(monthset.index)):
    splitup1 = monthset.iloc[i]['time'].split('T')
    datelist.append(splitup1)
    
# Split into date
date = [x[0] for x in datelist]
    
timelist = [x[2] for x in list]
    
# Split up time format based on 'T' in order to access the day
for i in range(0,len(timelist)):
   daysplit = timelist[i].split('T')
   daylist.append(daysplit)
    
day = [x[0] for x in daylist]
for i in range(0,len(day)):
    day[i] = int(day[i])
    

print("monthset; Converted Successfully")
monthset; Converting String Date and Time into Day and Date columns
monthset; Converted Successfully

We then add these columns into the dataset

In [126]:
monthset['date'] = date
monthset['day'] = day

print("monthset; Columns added successfully")
monthset; Columns added successfully
In [127]:
# Code to switch the time given by the API into a month format

print("yearset; Converting String Date and Time into Day, Month, Year and Date columns")

# Temporary, used to store values before adding to Pandas
month = []
year = []
day = []
list = []
datelist = []
daylist = []

# Split up the time format
for i in range(0,len(yearset.index)):
    splitup = yearset.iloc[i]['time'].split('-')
    list.append(splitup)
    
# Split up based on 'T'
for i in range(0,len(yearset.index)):
    splitup1 = yearset.iloc[i]['time'].split('T')
    datelist.append(splitup1)
    
# Split into date
date = [x[0] for x in datelist]

# Split into month and convert to an integer
month = [x[1] for x in list if len(list) > 3]
for i in range(0,len(month)):
    month[i] = int(month[i])

# Split into year and convert to an integer
year = [x[0] for x in list if len(list) > 3]
for i in range(0,len(year)):
    year[i] = int(year[i])
    
    
timelist = [x[2] for x in list]
    
# Split up time format based on 'T' in order to access the day
for i in range(0,len(timelist)):
   daysplit = timelist[i].split('T')
   daylist.append(daysplit)
    
day = [x[0] for x in daylist]
for i in range(0,len(day)):
    day[i] = int(day[i])
    

print("yearset; Converted Successfully")
yearset; Converting String Date and Time into Day, Month, Year and Date columns
yearset; Converted Successfully
In [128]:
yearset['year'] = year
yearset['month'] = month
yearset['date'] = date
yearset['day'] = day

print("monthset; Columns added successfully")
monthset; Columns added successfully

Task 4: Apply any pre-processing steps to clean/filter/combine the data¶

The next step of this process is to use this data which has been collected in order for analysis. The nature and scale of data provided by this API is such that a simplified set of data might be much more efficient for data analysis and visualisation.

Some sample columns which I chose to include in this table are the date and day columns which we just split up the data into. We will also take the longitude and latitude which we will later use in order to display the data on simple maps.

We will also take the Magnitude and the Depth of the earthqauke which will be used as the main variables in our analysis of data. We will keep the location column aswell as it provides a brief description of the location of the earthquake.

In [129]:
simp_monthset = monthset[["date","latitude","longitude","mag","depth","place","day"]]
print("monthset; Simplified Dataframe Created")
monthset; Simplified Dataframe Created

We will now do the same with the yearset, this time making sure to add the month and year columns which we also added to the dataset

In [130]:
simp_yearset = yearset[["date","latitude","longitude","mag","depth","place","day","month","year"]]
print("yearset; Simplified Dataframe Created")
yearset; Simplified Dataframe Created

In order to display the data better, we will sort these events in chronological order. The index's will now be in reverse

In [131]:
simp_monthset = simp_monthset.sort_values(by="date")
simp_yearset = simp_yearset.sort_values(by="date")
print("; Sorted by date")
; Sorted by date

As another preprocessing measure, we will search these now simplified datasets for null values. This will help with the overall display of data, and we may find some entries which are of no value to us.

For any earthquake with a magnitude value of 0, we will delete this from the montlhy dataset as it will not help with our displaying of data.

As the 20 Year dataset accounts for a minimum magnitude already, we will not need to do this.

In [132]:
simp_monthset_nozero = simp_monthset[simp_monthset.mag > 0.0]
In [133]:
simp_monthset[1:5]
Out[133]:
date latitude longitude mag depth place day
15315 2019-09-01 54.019900 -166.346900 1.70 6.90 19km NE of Dutch Harbor, Alaska 1
15314 2019-09-01 38.901667 -122.911000 0.60 0.91 10km SW of Kelseyville, CA 1
15313 2019-09-01 35.972333 -117.674000 0.48 2.69 21km E of Little Lake, CA 1
15312 2019-09-01 36.550833 -121.141333 1.93 6.53 2km N of Pinnacles, CA 1

Task 5: Analyse and summarise the cleaned dataset

We now have two reduced datadryd which we can use to analyse the data further. The first way in which we will analyse this data is to create a scatter plot which will be used to compare the Magnitude of these earthqaukes with the day in which they occured..

Firstly, we will display this chart using a histogram, showing the distribution of the values of Magnitudees. I have chosen to use 70 bins in the analysis of this data. This is due to the fact that the highest value earthquake for this month is 6.7. This means that 70 bins will show the data roughly in steps of 0.1

In [134]:
plt.figure(figsize=(30, 9))
simp_monthset_nozero['mag'].hist(bins = 45)
plt.ylabel("Count",size=20)
plt.xlabel("Magnitude",size=20)
plt.title("Distribution of Magnitudes for Monthly Dataset",size=20)
plt.show()

Next up, we will analyse the second dataset also showing the distribution of magnitudes. As we have min magnitude of 5.5 and a maximum magnitude value of 9.1, we will choose 40 bins instead this time. These two graphs will be shown under different scales, but in theory shoul dbe somewhat similar in shape.

In [135]:
plt.figure(figsize=(30, 10))
simp_yearset['mag'].hist(bins = 40,color='red')
plt.ylabel("Count",size=20)
plt.xlabel("Magnitude",size=20)
plt.title("Distribution of Magnitudes for 20 Year Dataset",size=20)
plt.show()

The results of analysing these two datasets are very interesting. Firstly, the shape of the monthset graph is quite surprising. While it is true to say that the overall data is right-skewed, there is an interesting spike in magnitude between 4 and 5 on the Richter scale.

The Richter scale works in such a way that each increase of 1.0 in magnitude should result in about a 10 times stronger earthquake. As such it is very surprising that there is a spike in the data. It is a possibillity that this data is taken over too short of a time scale for the patterns that should have appeared to develop, but with over 15000 entries in the data this is very surprising.

The second graph is much more of what I had expected when I was analysing this data. This is a very classic right-skew graph and shows the distribution as the magnitude of the earthquakes increases. There seems to be a huge outlier in the earthquake of magnitude 9.1. Let's find out more.

In [136]:
simp_yearset.loc[simp_yearset['mag'] == simp_yearset['mag'].max()]
Out[136]:
date latitude longitude mag depth place day month year
2541 2004-12-26 3.295 95.982 9.1 30.0 2004 Sumatra - Andaman Islands Earthquake 26 12 2004
1861 2011-03-11 38.297 142.373 9.1 29.0 2011 Great Tohoku Earthquake, Japan 11 3 2011

It is interesting to note that the depth of these earthquakes are quite similar. I would like to find out later on if these two characteristics of an earthquake are somehow related

Visulation of these Datasets on a World Map

For the purpose of visualising this earthquake data on a world map, I have chosen to eliminate any earthquake under a magnitude of 2.5. According to the website http://www.geo.mtu.edu/, this is the threshhold for an earthquake to be felt, and not simply to be recorded by a seismograph.

In [137]:
simp_monthset_greater_25 = simp_monthset[simp_monthset.mag > 2.5]

We will now quickly check to see what the minimum magnitude currently is

In [138]:
simp_monthset_greater_25['mag'].min()
Out[138]:
2.51

Here is the further simplified data shown on a map of the world

In [139]:
# Change size of the map
plt.figure(figsize=(20, 40))

# Uses a Mollweide projection, with long 0 at position 0, and resolution 'high'
m = Basemap(projection='moll',lon_0=0,resolution='h')
m.drawcoastlines()
m.fillcontinents(color='white',lake_color='white')
# draw parallels and meridians.
m.drawparallels(np.arange(-90.,120.,30.))
m.drawmeridians(np.arange(0.,420.,60.))
m.drawmapboundary(fill_color='white')
plt.title("Earthquake Data for September")

for i in range(0,len(simp_monthset_greater_25 .index)):
    long=simp_monthset_greater_25 .iloc[i]['longitude']
    lat=simp_monthset_greater_25 .iloc[i]['latitude']
    x,y = m(long,lat)
    m.plot(x,y,"ro",markersize=4)
    
plt.show()
C:\Users\u\Anaconda3\lib\site-packages\ipykernel_launcher.py:5: MatplotlibDeprecationWarning: 
The dedent function was deprecated in Matplotlib 3.1 and will be removed in 3.3. Use inspect.cleandoc instead.
  """

This is a very interesting diagram as it shows the distribution of the earthquakes around the world. As earthquakes occur most frequently along fault lines, it is interesting to see the amount of earthquakes which occur, for example, along the San Andreas Fault.

Another interesting way in which we can analyse this data is to map these earthquakes based on their strength. We will use a different colour code in order to display each one based upon their magnitude.

In [140]:
# This changes the size of the map
plt.figure(figsize=(20, 40))

# Uses a Mollweide projection, with long 0 at position 0, and resolution 'high'
m = Basemap(projection='moll',lon_0=0,resolution='h')
m.drawcoastlines()
m.fillcontinents(color='white',lake_color='white')
m.drawmapboundary(fill_color='white')
plt.title("Earthquake Data for September, with colour ranks")

for i in range(0,len(simp_monthset_greater_25 .index)):
    long=simp_monthset_greater_25 .iloc[i]['longitude']
    lat=simp_monthset_greater_25 .iloc[i]['latitude']
    x,y = m(long,lat)
    
    mag = simp_monthset.iloc[i]['mag']
    
    if((mag >= 2.5) & (mag < 3.5)):
        colour = "yo"
        size = 4
    elif((mag >= 3.5 ) & (mag < 4.5)):
        colour = "go"
        size = 4
    elif((mag >= 4.5)& (mag < 5.5)):
        colour = "bo"
        size = 4
    elif((mag >= 5.5)& (mag < 6)):
        colour = "mo"
        size = 8
    elif(mag >= 6):
        colour = 'ro'
        size = 8
    m.plot(x,y,colour,markersize=size)
    
plt.show()
C:\Users\u\Anaconda3\lib\site-packages\ipykernel_launcher.py:5: MatplotlibDeprecationWarning: 
The dedent function was deprecated in Matplotlib 3.1 and will be removed in 3.3. Use inspect.cleandoc instead.
  """

The Key for the above diagram is as follows:

Yellow = 2.5 - 3.5
Green  = 3.5 - 4.5 
Blue   = 4.5 - 5.5 
Magenta= 5.5 - 6.5
Red    = 6.5+

Compared to the first graphic, which displays all of the earthquakes with the same rank, this diagram is very interesting. Upon first glance, many areas which had seemed to be stricken with earthquakes were in fact simply very weak earthquakes, with most barely being felt. Parallels and Meridians have been removed for easier viewing of data.

According to http://www.geo.mtu.edu/UPSeis/magnitude.html, earthquakes from 2.5 to 5.4 on the Richter Scale are usually felt, but only cause 'minor damage'. What this means with reference to the diagram which can be found above is that over the month of September, only the earthquakes labelled with a Magenta dot on the map would have caused even minor damage to buildings and other structures. These have been made significantly larger than the other earthquakes in order to highlight them on the map.

Those earthquakes which are displayed using a red dot are even stronger again, with these having the possibillity to 'cause a lot of damage in populated areas'.

This will change significantly when we look at the larger earthquakes which have occured in the past.

Before Moving on to the 20 Year Set, we will take a quick look at the distribution of these earthquakes over north America using a refrain map

In [141]:
from mpl_toolkits.basemap import Basemap
import numpy as np
import matplotlib.pyplot as plt

# This changes the size of the map
plt.figure(figsize=(20, 40))

m = Basemap(width=12000000,height=9000000,projection='lcc',
            resolution=None,lat_1=45.,lat_2=55,lat_0=50,lon_0=-107.)
m.shadedrelief()
plt.title("Map of North America")


for i in range(0,len(simp_monthset_greater_25 .index)):
    long=simp_monthset_greater_25 .iloc[i]['longitude']
    lat=simp_monthset_greater_25 .iloc[i]['latitude']
    x,y = m(long,lat)
    
    mag = simp_monthset.iloc[i]['mag']
    
    if((mag >= 2.5) & (mag < 3.5)):
        colour = "yo"
        size = 4
    elif((mag >= 3.5 ) & (mag < 4.5)):
        colour = "go"
        size = 4
    elif((mag >= 4.5)& (mag < 5.5)):
        colour = "bo"
        size = 4
    elif((mag >= 5.5)& (mag < 6)):
        colour = "mo"
        size = 6
    elif(mag >= 6):
        colour = 'ro'
        size = 6
    m.plot(x,y,colour,markersize=size)
    
plt.show()

The Key for the above diagram is as follows:

Yellow = 2.5 - 3.5
Green  = 3.5 - 4.5 
Blue   = 4.5 - 5.5 
Magenta= 5.5 - 6.5
Red    = 6.5+

It is quite interesting to note the quantity of small earthquakes which occur, especially along the mountain ranges shown by the relief map. While only the earthquakes which are coloured either magenta or red are even felt by us, the Earth underneath us, which formed the mountains shown here is in constant motion

20 Year Data

Firstly, we will do the same as initially with the monthset, and plot the data simply based on where they occured, ignoring magnitude

In [142]:
# Change size of the map
plt.figure(figsize=(20, 40))

# Uses a Mollweide projection, with long 0 at position -120
# in order to display the Pacific Ring of Fire, and resolution 'high'
m = Basemap(projection='moll',lon_0=-120,resolution='h')
m.drawcoastlines()
m.fillcontinents(color='white',lake_color='white')
# draw parallels and meridians.
m.drawparallels(np.arange(-90.,120.,30.))
m.drawmeridians(np.arange(0.,420.,60.))
m.drawmapboundary(fill_color='white')
plt.title("Earthquake Data for September")

for i in range(0,len(simp_yearset.index)):
    long=simp_yearset.iloc[i]['longitude']
    lat=simp_yearset.iloc[i]['latitude']
    x,y = m(long,lat)
    m.plot(x,y,"ro",markersize=4)
    
plt.show()
C:\Users\u\Anaconda3\lib\site-packages\ipykernel_launcher.py:6: MatplotlibDeprecationWarning: 
The dedent function was deprecated in Matplotlib 3.1 and will be removed in 3.3. Use inspect.cleandoc instead.
  

What is very interesting in this map, but should not really be taken as a surpise, is that by plotting the earthquakes which have occured over the last 20 years, we almost perfectly draw the plates tectonics outlines of the world.

While it makes sense that most of the earthquakes which occur would be along plate boundaries, it is very interesting that there is a much greater concentration of these earthquakes along the convergent boundaries, where plates drive against each other. By plotting the earthquakes of the last 20 years, we can easily deduce whether a plate boundary is a convergent one or not

This can be seen in the map below

In [143]:
from IPython.display import Image
Image("https://geology.com/plate-tectonics/plate-boundary-map-780.jpg")
Out[143]:
In [144]:
# This changes the size of the map
plt.figure(figsize=(20, 40))

# Uses a Mollweide projection, with long 0 at position 0, and resolution 'high'
m = Basemap(projection='moll',lon_0=-120,resolution='h')
m.drawcoastlines()
m.fillcontinents(color='white',lake_color='white')
# draw parallels and meridians.
m.drawparallels(np.arange(-90.,120.,30.))
m.drawmeridians(np.arange(0.,420.,60.))
m.drawmapboundary(fill_color='white')
plt.title("Earthquake Data for last 20 Years, with colour ranks")

for i in range(0,len(simp_yearset.index)):
    long=simp_yearset.iloc[i]['longitude']
    lat=simp_yearset.iloc[i]['latitude']
    x,y = m(long,lat)
    
    mag = simp_yearset.iloc[i]['mag']
    
    if((mag >= 5.5) & (mag < 7)):t
        colour = "go"
        size = 2
    elif((mag >= 7) & (mag < 8)):
        colour = "mo"
        size = 9
    elif(mag >= 8):
        colour = 'ro'
        size = 9
    m.plot(x,y,colour,markersize=size)
    
plt.show()
C:\Users\u\Anaconda3\lib\site-packages\ipykernel_launcher.py:5: MatplotlibDeprecationWarning: 
The dedent function was deprecated in Matplotlib 3.1 and will be removed in 3.3. Use inspect.cleandoc instead.
  """

The Key for the above diagram is as follows:

Green   = 5.5 - 7
Magenta = 7 - 8
Red     = 8+ 


For this example I chose to use slightly less colours in order to display the data. Any earthquake which is coloured Magenta is considered a major earthquake. These cause serious damage. There are estimated to be around 20 of these each year.

Earthquakes coloured in red are considered great earthquakes. They can totally destroy communities near the epicentre. It is again very interesting to note just how powerful the earthquakes are around the pacific ring of fire

Like we did previously with the monthly set, I have chosen to further bring up a relief map of South America to show these earthquakes with reference to terrain which is on the ground. As earthquakes are a byproduct of boundary movement, we can expect to find that a lot of these earthquakes occur where the mountains have formed in the last millions of years

In [145]:
# This changes the size of the map
plt.figure(figsize=(10, 20))

from mpl_toolkits.basemap import Basemap
import numpy as np
import matplotlib.pyplot as plt
m = Basemap(projection='merc', lat_0 = -37, lon_0 = -71,
    resolution = 'h', area_thresh = 0.1,
    llcrnrlon=-85, llcrnrlat=-60,
    urcrnrlon=-33, urcrnrlat=10 )
m.drawcoastlines()
# draw parallels and meridians.
m.drawparallels(np.arange(-90.,120.,30.))
m.drawmeridians(np.arange(0.,420.,60.))
m.shadedrelief()
plt.title("Earthquake Activity, South America, Last 20 Years")


for i in range(0,len(simp_yearset.index)):
    long=simp_yearset.iloc[i]['longitude']
    lat=simp_yearset.iloc[i]['latitude']
    x,y = m(long,lat)
    
    mag = simp_yearset.iloc[i]['mag']
    
    if((mag >= 5.5) & (mag < 7)):
        colour = "go"
        size = 2
    elif((mag >= 7) & (mag < 8)):
        colour = "mo"
        size = 4
    elif(mag >= 8):
        colour = 'ro'
        size = 9
    m.plot(x,y,colour,markersize=size)
     
plt.show()
C:\Users\u\Anaconda3\lib\site-packages\ipykernel_launcher.py:10: MatplotlibDeprecationWarning: 
The dedent function was deprecated in Matplotlib 3.1 and will be removed in 3.3. Use inspect.cleandoc instead.
  # Remove the CWD from sys.path while we load stuff.

The Key for the above diagram is as follows:

Green   = 5.5 - 7
Magenta = 7 - 8
Red     = 8+ 

Analysis of 20 Year Data by Year

In order to get a greater insight into the actual data found in this dataset, it may be useful to group the 20 year data we have acquired into groups seperated by month and also by year. This will provide valuable ionformation which we can use to visualise using the built in graphing features of pandas.

In order to do this, we will further reduce the data that we are comparing. For this, we will simply take the magnitude and the depth of each earthquake, along with the year that it occured.

We can also use this monthly data to compare the month of september 2019, and where this would fit in and compare to the data provided [By examining the data whose mag > 5.5]

First off, we will chart the yearly mean depth of the earthquakes

In [146]:
yeardepth = simp_yearset[['depth','year']]
In [147]:
yeardepth.describe()
Out[147]:
depth year
count 10092.000000 10092.000000
mean 66.770931 2008.965517
std 126.706901 5.636795
min 0.000000 1999.000000
25% 10.000000 2004.000000
50% 24.000000 2009.000000
75% 44.625000 2014.000000
max 691.600000 2019.000000
In [148]:
meanyeardepth = yeardepth['depth'].mean()
In [149]:
meanyeardepth
Out[149]:
66.77093103448276

We now group them by year

In [150]:
yeardepth = yeardepth.groupby('year').mean()

Here are the mean values:

In [151]:
#yeardepth
In [152]:
yeardepth.plot(figsize=(10, 5))
plt.xticks(np.arange(1999, 2019, step=1))
plt.yticks(np.arange(30,100,step=5))
plt.plot([1999, 2019], [meanyeardepth, meanyeardepth], color='g', linestyle='--', linewidth=2)
plt.title("Mean Yearly Depth of Earthquakes since 1st September 1999")
plt.ylabel("Depth")
plt.xlabel("Time")
plt.show()

As we can see from this graph, there seems to be a lot of outliers in this set of data. The dashed green line is the overall mean for this data. We can see that the standard deviation is quite high in this sample. We will now do the same for the Magnitudes of Earthquakes and compare the results to one another

In [153]:
print("The standard deviation for this data is " + str(yeardepth.std()))
The standard deviation for this data is depth    10.855906
dtype: float64
In [154]:
yearmag = simp_yearset[['mag','year']]
In [155]:
yearmag['mag'].describe()
Out[155]:
count    10092.000000
mean         5.877379
std          0.427611
min          5.500000
25%          5.600000
50%          5.700000
75%          6.000000
max          9.100000
Name: mag, dtype: float64
In [156]:
meanyearmag = yearmag['mag'].mean()
In [157]:
yearmag = yearmag.groupby('year').mean()
In [158]:
#yearmag
In [159]:
yearmag.plot(figsize=(10, 5),color='red')
plt.yticks(np.arange(5.8, 6, step=0.02))
plt.xticks(np.arange(1999, 2019, step=1))
plt.plot([1999, 2019], [meanyearmag, meanyearmag], color='g', linestyle='--', linewidth=2)
plt.title("Mean Yearly Magnitude of Earthquakes since 1st September 1999")
plt.ylabel("Magnitude")
plt.xlabel("Time")
plt.show()
In [160]:
print("The standard deviation for this data is " + str(yearmag.std()))
The standard deviation for this data is mag    0.021564
dtype: float64

As we can see from this graph, the mean earthquake magnitude over the last 20 years has stayed very much the same. The standard deviation for this sample is very small. It might be interesting to plot both the depth and magnitude of these earthquakes in order to see if there is any correlation between years with a higher mean depth of earthquake, and magnitude

One possible way in which we could improve this dataset for analysis would be to remove outliers in the depths and magnitudes and repeat the graphing process. This may give a clearer result.

How this works is by eliminating any value from the set if it is outside 3 Standard Deviations of the mean.

In [161]:
outlierdepth = simp_yearset[['depth','year']]
In [162]:
reduced_outlier_depth=outlierdepth[np.abs(outlierdepth.depth-outlierdepth.depth.mean()) <= (3*outlierdepth.depth.std())]
In [163]:
reduced_outlier_depth = reduced_outlier_depth.groupby('year').mean()
In [164]:
reduced_outlier_depth.plot(figsize=(10, 5),color='green')
plt.xticks(np.arange(1999, 2019, step=1))
plt.yticks(np.arange(30,100,step=5))
plt.title("Mean Yearly Depth of Earthquakes within 3 Standard Deviations from Mean since 1st September 1999")
plt.ylabel("Depth")
plt.xlabel("Time")
plt.show()
In [165]:
print("The standard deviation for this data is " + str(reduced_outlier_depth.std()))
The standard deviation for this data is depth    4.694693
dtype: float64

While the mean depth is dramatically decreased in this dataset, I feel that this is an inaccurate depiction of the earthquake data

Analysis of 20 Year Data by Month

We will now attempt to do the same type of analysis but instead sorting by month instead of year. As the method is basically the same, I will try to do this in a much shorter way

In [166]:
monthdepth = simp_yearset[['depth','month']]
monthmag = simp_yearset[['mag','month']]

We will now group the mean values by month

In [167]:
monthdepth = monthdepth.groupby('month').mean()
monthmag = monthmag.groupby('month').mean()

The mean value for the entire dataset is shown as a dashed green line on the below diagrams

In [168]:
monthdepth.plot(figsize=(10, 5))
plt.xticks(np.arange(1, 12, step=1))
plt.yticks(np.arange(40, 80, step=5))
plt.plot([1, 12], [monthdepth['depth'].mean(), monthdepth['depth'].mean()], color='g', linestyle='--', linewidth=2)
plt.title("Mean Monthly Depth of Earthquakes since 1st September 1999")
plt.ylabel("Depth")
plt.xlabel("Time")
plt.show()

We will do the same for the magnitudes

In [169]:
monthmag.plot(figsize=(10, 5),color='red')
plt.xticks(np.arange(1, 12, step=1))
plt.yticks(np.arange(5.8, 6, step=0.02))
plt.plot([1, 12], [monthmag['mag'].mean(), monthmag['mag'].mean()], color='g', linestyle='--', linewidth=2)
plt.title("Mean Monthly Magnitude of Earthquakes since 1st September 1999")
plt.ylabel("Magnitude")
plt.xlabel("Time")
plt.show()

We will now calculate the standard deviation of both of these graphs

In [170]:
print("The standard deviation for Magnitude data is " + str(monthmag.std()))
print("The standard deviation for Depth data is " + str(monthdepth.std()))
The standard deviation for Magnitude data is mag    0.014059
dtype: float64
The standard deviation for Depth data is depth    6.959208
dtype: float64

As we can see from the both the data above and the graphs, there is a very small standard deviation where Magnitude is concerned. This could be attributed to the sample size of the data.

Despite this, there is still quite a large standard deviation in the depth of the earthquakes, although not quite as large as the previous data set showed. It would be very interesting to see if the two of these patterns both still hold using smaller sample or even larger sample sizes

Analysis of September 2019 Data

We have already used line plots to analyse the data contained in the 20 Year Dataset. We will now quickly analyse the monthly data, in order to see how this monthly data would have fit into the model created above.

In [171]:
septmeanmag=simp_monthset['mag'].mean()
In [172]:
septmeanmag
Out[172]:
1.4152302903246285
In [173]:
septmeandepth=simp_monthset['depth'].mean()
In [174]:
septmeandepth
Out[174]:
17.959037641148353

While these are the mean depth and mean magnitudes for the month of september, in order to compare these against our 20 year average, we will have to only choose the earthquakes with a magnitude of over 5.5.

In [175]:
comparison = simp_monthset[simp_monthset.mag >= 5.5]
In [176]:
comparison['mag'].mean()
Out[176]:
5.848484848484847
In [177]:
monthmag.plot(figsize=(10, 5),color='red')
plt.xticks(np.arange(1, 12, step=1))
plt.yticks(np.arange(5.8, 6, step=0.02))
plt.plot([0, 12], [comparison['mag'].mean(), comparison['mag'].mean()], color='g', linestyle='--', linewidth=2)
plt.title("Mean Monthly Magnitude of Earthquakes since 1st September 1999")
plt.ylabel("Magnitude")
plt.xlabel("Time")
plt.show()

As can be seen from the dashed green line on the graph, the mean magnitude for the month of September was below the monthly average for past 20 years

In [178]:
comparison['depth'].mean()
Out[178]:
112.53757575757577
In [179]:
monthdepth.plot(figsize=(10, 5))
plt.xticks(np.arange(1, 12, step=1))
plt.yticks(np.arange(40, 120, step=5))
plt.plot([0, 12], [comparison['depth'].mean(), comparison['depth'].mean()], color='g', linestyle='--', linewidth=2)
plt.title("Mean Monthly Depth of Earthquakes since 1st September 1999")
plt.ylabel("Depth")
plt.xlabel("Time")
plt.show()

As you can see from the dashed green line drawn on the above graph, the mean value for depth over the month of September was a huge outlier

Depth vs. Magnitude: Is there a correlation?

Firstly, we will check the correlation between the depth and the magnitude using the means for each year in the last 20 years, this is graphed below

In [180]:
yeardepthmag = simp_yearset[['depth','mag','year']]
In [181]:
yeardepthmag = yeardepthmag.groupby('year').mean()
In [182]:
yeardepthmag.plot.scatter(x="mag",y="depth",s=50,c='red')
plt.title("Scatter Plot of Magnitude vs Depth over past 20 years")
plt.show()

As we can see from this graph, there doesn't seem to be much of a correlation between these two attributes. We will calculate the correlation coefficient

In [183]:
yeardepthmag.corr(method ='pearson')
Out[183]:
depth mag
depth 1.000000 -0.305566
mag -0.305566 1.000000

There seems to be a negative weak linear relationship between the the mean yearly depth and mean yearly magnitude of the earthquakes that have occured across the world in the last 20 years

We will also compare the data for the each day over the month of September 2019

In [184]:
monthdepthmag = simp_monthset[['depth','mag','day']]

We will group these by day in order to graph them better

In [185]:
monthdepthmag = monthdepthmag.groupby('day').mean()
In [186]:
monthdepthmag.plot.scatter(x="mag",y="depth",s=50)
plt.title("Scatter Plot of Magnitude vs Depth per day over September 2019")
plt.show()

When mean magnitude per day is graphed against depth for the month of september, there looks to be some sort of correlation between the two. To put a value on this we will again find the correlation

In [187]:
monthdepthmag.corr(method ='pearson')
Out[187]:
depth mag
depth 1.000000 0.451493
mag 0.451493 1.000000

This indicates a positive weak-medium correlation between the two factors. This is very interesting that over a shorter period of time there is more of a correlation between the two characteristics.

One thing that may be influencing these statistics is the size of the earthquakes. Lets again do this scatter plot for the earthquakes above magnitude 5.5 in this set

In [188]:
monthdepthmag_greater_25 = simp_monthset[['depth','mag']]
In [189]:
monthdepthmag_greater_25 = monthdepthmag_greater_25[monthdepthmag_greater_25['mag'] >= 5.5]
In [190]:
monthdepthmag_greater_25.plot.scatter(x="mag",y="depth",s=50,color='magenta')
plt.title("Scatter Plot of Magnitude vs Depth per day (mag>=5.5) over September 2019")
plt.show()
In [191]:
monthdepthmag_greater_25.corr(method ='pearson')
Out[191]:
depth mag
depth 1.00000 0.22033
mag 0.22033 1.00000

There seems to be no correlation between the two earthquake characteristics when a filter of > 5.5 magnitude is applied to the dataset.

This is an extremely interesting result, as it indicates that with every earthquake in a time period, there may be a correlation between depth and magnitude, as shown by the analysis on the month of September. When only the earthquakes above 5.5 are taken into account, like in the 20 year dataset, there is little to no correlation, which explains our results above.

With this in mind, we may be able to use a regression line in order to predict the magnitude of an earthquake based on its depth, or vice versa. As we do not have a full dataset of every earthquake, I will use the month of September as the basis for this analysis. As it is done over such a short time frame it may be slightly inaccurate, but with a larger dataset it may be much more effective

In [192]:
import seaborn as seab
In [193]:
seab.lmplot(x='depth',y='mag',data=monthdepthmag,fit_reg=True)
Out[193]:
<seaborn.axisgrid.FacetGrid at 0x1d0bc29fa20>

This is the same scatterplot found above, but with a regression line fitted to it

2018: An Analysis

A way to go a step further with this project would be to collect the data from an entire year or mores worth of earthquakes and perform the same analysis as was done towards the end of this project. The scatterplots created could give an excellent insight into a possible correlation between the two earthquake characteristics.

For the final part of this project, I will quickly import the data for all of the earthquakes which occured in 2018. I will then quickly group these based on the month in which they occured. I will find the mean data for magnitude and depth and graph these, including a regression line.

Hopefully this will give us an answer to our question.

In [194]:
testset = pd.read_csv("https://earthquake.usgs.gov/fdsnws/event/1/query?format=csv&starttime=2018-1-01&endtime=2018-1-31")

for i in range(1,12):
    if i is 3 or 5 or 7 or 8 or 10 or 12:
        temp1 = pd.read_csv("https://earthquake.usgs.gov/fdsnws/event/1/query?format=csv&starttime=2018-" + str(i)
                                   +"-01&endtime=2018-"+str(i)+"-15")
        temp2 = pd.read_csv("https://earthquake.usgs.gov/fdsnws/event/1/query?format=csv&starttime=2018-" + str(i)
                                   +"-16&endtime=2018-"+str(i)+"-31")
    elif i is 4 or 6 or 9 or 11:
        temp1 = pd.read_csv("https://earthquake.usgs.gov/fdsnws/event/1/query?format=csv&starttime=2018-" + str(i)
                                   +"-01&endtime=2018-"+str(i)+"-15")
        temp2 = pd.read_csv("https://earthquake.usgs.gov/fdsnws/event/1/query?format=csv&starttime=2018-" + str(i)
                                   +"-16&endtime=2018-"+str(i)+"-30")
    elif i is 2:
        temp1 = pd.read_csv("https://earthquake.usgs.gov/fdsnws/event/1/query?format=csv&starttime=2018-" + str(i)
                                   +"-01&endtime=2018-"+str(i)+"-15")
        temp2 = pd.read_csv("https://earthquake.usgs.gov/fdsnws/event/1/query?format=csv&starttime=2018-" + str(i)
                                   +"-16&endtime=2018-"+str(i)+"-28")
    testset = pd.concat([testset,temp1])
    testset = pd.concat([testset,temp2])
    print("import of " + str(i) + " completed")


print("testet; Input complete")
print("testset; testset contains: " + str(len(testset)) + " entries")
import of 1 completed
import of 2 completed
import of 3 completed
import of 4 completed
import of 5 completed
import of 6 completed
import of 7 completed
import of 8 completed
import of 9 completed
import of 10 completed
import of 11 completed
testet; Input complete
testset; testset contains: 171891 entries

We now split the time into month data like above

In [195]:
month = []
list = []

# Split up the time format
for i in range(0,len(testset.index)):
    splitup = testset.iloc[i]['time'].split('-')
    list.append(splitup)
    
# Split into month and convert to an integer
month = [x[1] for x in list if len(list) > 3]
for i in range(0,len(month)):
    month[i] = int(month[i])
In [196]:
testset['month'] = month

We create a simplified set and make sure all elements have a value >0

In [197]:
simp_testset = testset[['month','mag','depth']]
simp_testset = simp_testset[simp_testset.mag > 0.0]
In [198]:
seab.lmplot(x='depth',y='mag',data=simp_testset,fit_reg=True)
Out[198]:
<seaborn.axisgrid.FacetGrid at 0x1d0c3fc24a8>
In [199]:
simp_testset['mag'].corr(simp_testset['depth'])
Out[199]:
0.3007603545582519

Tentative Conclusion

From the analysis of just a month's data, it seems that there may be a medium correlation between the magnitude of an earthquake and the depth which it occured inside the earth.

This is slightly contradicted by the data shown above from the analysis of 2018, where there is a weak correlation between the two elements. With a much larger sample size, we should now be able to give a more accurate answer to the question found above. First of all, I believe there is evidence here to prove correlation between the two characteristics.

While this may be only a weak correlation, a correlation co-effiecient of 0.3 shows that as magnitude increases, so does the depth for the earthquakes shown in this dataset. An even bigger sample size would be needed to say for definite, but I believe this is at least some evidence for the fact.

This project has been incredibly interesting to study, and would love to further explore this data in the future and see what conclusions I may extract